Imagine we’re some fancy data scientists exploring - once again - the gapminder data. We’re particularly interested in the development of the GDP across time and across countries. Some R-fanatics from GESIS recommended using this tidyverse thing in order to complete our tasks. At the same time, they also hesitate to load all of its R-packages at once.
Load the packages from the tidyverse for importing Excel data and for data wrangling.
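One way to load only what is needed might look like the sketch below. It assumes that readxl covers the Excel import and that dplyr and tidyr cover the wrangling; the exercise itself does not name the packages.

```r
# Load single tidyverse packages instead of library(tidyverse).
# Assumption: these three are enough for the tasks in this exercise.
library(readxl)  # read_excel() for importing Excel files
library(dplyr)   # select(), slice(), rename(), filter(), summarise(), %>%
library(tidyr)   # gather() for reshaping from wide to long
```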
Ok, that wasn’t too hard. But data science is about data, so we have to load in the data.
Hint: you can pick the worksheet to import with the argument sheet = "name_of_your_sheet".
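A minimal import sketch, assuming the data sit in an Excel file on disk. The file name gapminder_gdp.xlsx is purely hypothetical, and the sheet name is the placeholder from the hint, so replace both with your own.

```r
library(readxl)

# Hypothetical file name; point this to wherever your Excel file lives.
path <- "gapminder_gdp.xlsx"

if (file.exists(path)) {
  # List the worksheet names first if you are unsure which one to use.
  print(excel_sheets(path))
  # Placeholder sheet name; use one of the names printed above.
  gap_wide <- read_excel(path, sheet = "name_of_your_sheet")
} else {
  message("File not found: ", path)
}
```

The file.exists() check is only there so the sketch does not stop with an error when the hypothetical file is missing.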
Have the data been successfully imported? They should comprise a tibble of 275 x 53. Furthermore, the income per person for Algeria in the years 1960, 1961, and 1962 should be 1280, 1085, and 856.
Check this by selecting columns with select() and by filtering rows by number with slice().
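Such a check could be sketched as follows. Since the real file is not available here, the code builds a tiny stand-in tibble: only the Algeria values come from the exercise text, while "Country A", "Country B" and their numbers are invented for illustration. In the real data Algeria will sit in a different row, so the row number in slice() needs adjusting.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the imported wide data (invented except for Algeria).
gap_wide <- tribble(
  ~`Income per person (fixed 2000 US$)`, ~`1960`, ~`1961`, ~`1962`,
  "Country A",                               410,     420,     430,
  "Algeria",                                1280,    1085,     856,
  "Country B",                                NA,      NA,    2100
)

# Rows and columns; the real data should report 275 and 53 here.
dim(gap_wide)

# Keep the country column plus the first three years, then pick one row.
algeria <- gap_wide %>%
  select(1:4) %>%
  slice(2)  # row 2 holds Algeria in this toy tibble only

algeria
```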
Let’s say we’re interested in the earliest 10 years of development in all countries and in the most recent 10 years. The idea is that there might be some differences between the early days and the new days of GDP development. At first, we’d like to compute such statistics across all countries. Unfortunately, the data are in the wide format.
Convert the data into the long format using gather(). Additionally, you might want to create a more convenient column name for the variable Income per person (fixed 2000 US$) with rename(), as it’s really messy.
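The reshaping step might look roughly like this. The new column names country, year, and gdp are our own choice, and the toy tibble again uses invented values except for Algeria.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for the imported wide data (invented except for Algeria).
gap_wide <- tribble(
  ~`Income per person (fixed 2000 US$)`, ~`1960`, ~`1961`, ~`1962`,
  "Country A",                               410,     420,     430,
  "Algeria",                                1280,    1085,     856,
  "Country B",                                NA,      NA,    2100
)

gap_long <- gap_wide %>%
  # Give the messy first column a short name (the name is our choice).
  rename(country = `Income per person (fixed 2000 US$)`) %>%
  # Stack all year columns into a key column (year) and a value column (gdp).
  gather(key = "year", value = "gdp", -country)

gap_long
```

Note that year comes out as a character column, because the values were column names before the reshaping.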
Ok, did it work out? There are still a lot of missing values we might get rid of, and the data are not arranged in a proper way. The missing values make the data untidy, distract us, and are not part of any mean calculations anyway. For the upcoming tasks, simply re-use your code and add the next commands with the pipe operator %>%.
Remove the missing values using filter() in combination with !is.na().
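A small sketch of this cleaning step, assuming the long data have the columns country, year, and gdp (names chosen for illustration). The arrange() call is one possible way to sort the rows; the values are toy data, invented except for Algeria.

```r
library(dplyr)
library(tibble)

# Toy long-format data (invented except for the Algeria values).
gap_long <- tribble(
  ~country,    ~year,  ~gdp,
  "Country A", "1960",   NA,
  "Algeria",   "1960", 1280,
  "Country A", "1961",  500,
  "Algeria",   "1961", 1085
)

gap_clean <- gap_long %>%
  filter(!is.na(gdp)) %>%   # keep only rows with an observed value
  arrange(country, year)    # sort by country, then by year

gap_clean
```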
Nice. Now we have a - more or less - clean dataset for our actual task: calculating the mean values across all countries for each of the first ten years and each of the last ten years. What’s still a little bit distracting is that the data still contain the values for all years between these two time periods. But we decided to leave them there for some future analyses. As such, we do all analyses on the fly. Let’s start with the first time period.
Calculate the mean GDP across all countries for each of the first ten years.
Hint: if you convert the year variable to double you can simply filter the range of years you are interested in.
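One possible way to put this together is sketched below. It assumes cleaned long data with the columns country, year (still character), and gdp; these names and all toy values are invented for illustration. Instead of hard-coding a start year, the sketch takes the earliest year found in the data.

```r
library(dplyr)
library(tibble)

# Toy cleaned data: two invented countries, twelve years each.
gap_clean <- tibble(
  country = rep(c("Country A", "Country B"), each = 12),
  year    = rep(as.character(1960:1971), times = 2),
  gdp     = c(seq(100, 1200, by = 100), seq(200, 2400, by = 200))
)

early_means <- gap_clean %>%
  mutate(year = as.double(year)) %>%   # character -> double for comparisons
  filter(year <= min(year) + 9) %>%    # earliest year plus the nine after it
  group_by(year) %>%                   # one group per year
  summarise(mean_gdp = mean(gdp))      # mean across all countries

early_means
```

Because the filter sits inside the pipeline, the full data stay untouched for later analyses.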
After this is done, you might know how to do the same for the 10 most recent years…
Calculate the mean GDP across all countries for each of the last ten years.
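For the most recent years, only the filter condition needs to change. The sketch below again uses invented toy data and assumed column names (country, year, gdp), and counts back from the latest year found in the data.

```r
library(dplyr)
library(tibble)

# Toy cleaned data: two invented countries, twelve years each.
gap_clean <- tibble(
  country = rep(c("Country A", "Country B"), each = 12),
  year    = rep(as.character(1960:1971), times = 2),
  gdp     = c(seq(100, 1200, by = 100), seq(200, 2400, by = 200))
)

recent_means <- gap_clean %>%
  mutate(year = as.double(year)) %>%   # character -> double for comparisons
  filter(year >= max(year) - 9) %>%    # latest year plus the nine before it
  group_by(year) %>%
  summarise(mean_gdp = mean(gdp))      # mean across all countries

recent_means
```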